Panels of mRNAs and miRNAs for decoding molecular mechanisms of Renal Cell Carcinoma (RCC) subtypes utilizing Artificial Intelligence approaches

Renal Cell Carcinoma (RCC) encompasses three histological subtypes, namely clear cell RCC (KIRC), papillary RCC (KIRP), and chromophobe RCC (KICH), each of which has a different clinical course, genetic/epigenetic drivers, and therapeutic response. This study aimed to identify the significant mRNA and microRNA panels involved in the pathogenesis of RCC subtypes. The mRNA and microRNA transcript profiles were obtained from The Cancer Genome Atlas (TCGA). The mRNA data included 611 ccRCC patients, 321 pRCC patients, and 89 chRCC patients, and the miRNA data included 616 ccRCC patients, 326 pRCC patients, and 91 chRCC patients. To identify mRNAs and miRNAs, feature selection based on filter and graph algorithms was applied. Then, a deep model was used to classify the RCC subtypes. Finally, an association rule mining algorithm was used to reveal the features with significant roles in triggering the molecular mechanisms that cause RCC subtypes. Panels of 77 mRNAs and 73 miRNAs could discriminate the KIRC, KIRP, and KICH subtypes from each other with 92% (F1-score ≥ 0.9, AUC ≥ 0.89) and 95% accuracy (F1-score ≥ 0.93, AUC ≥ 0.95), respectively. The association rule mining analysis identified miR-28 (repeat count = 2642) and CSN7A (repeat count = 5794) in the KIRC rules, and miR-125a (repeat count = 2591) and NMD3 (repeat count = 2306) in the KIRP rules, as the features with the highest repeat counts. This study found new panels of mRNAs and miRNAs that distinguish among RCC subtypes and provide new insights into the mechanisms underlying the initiation and progression of KIRC and KIRP. The proposed mRNA and miRNA panels have a high potential to serve as biomarkers of RCC subtypes and should be examined in future clinical studies.


Classification
In this step, we applied a classifier to evaluate the candidate features selected in the previous step.
High classification accuracy (or any other user-defined measure) demonstrates that the feature selection method succeeded in choosing the relevant attributes; otherwise, the feature selection method failed to identify the relevant features.
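As an illustration of the measures used to judge the selected features, the sketch below computes accuracy and a macro-averaged F1-score in plain Python. The label lists are hypothetical toy values, not study data, and the macro average is an assumption, since the text does not state how the reported F1-scores were aggregated.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """One-vs-rest F1 for each label: 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (an assumed aggregation)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels for six samples (not taken from the study).
y_true = ["KIRC", "KIRC", "KIRP", "KIRP", "KICH", "KICH"]
y_pred = ["KIRC", "KIRC", "KIRP", "KIRC", "KICH", "KICH"]

print(accuracy(y_true, y_pred))
print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

A per-class view is useful here because the three subtypes have very different sample sizes, so overall accuracy alone can hide poor performance on the smallest class.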
Classification is defined as the process of predicting the class of given data points using mathematical methods. It is a machine learning task in which the machine learns how to assign a class label to samples from the problem domain. Classes are sometimes called targets, labels, or categories. The classification model (classifier) learns to approximate a mapping function (f) from the feature space (X) to discrete output variables (y). In this regard, the classifier uses the input training data to predict the likelihood or probability that a sample belongs to each of the predetermined categories. In machine learning, classification algorithms leverage a wide range of methods to assign samples to the correct categories.
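The idea of a classifier as a learned mapping from feature vectors to class labels, with a probability for each class, can be sketched with a very simple model. The example below is a nearest-centroid classifier with a softmax over negative squared distances. It is a stand-in chosen for brevity, not the deep auto-encoder used in this work, and the feature values and the labels "A", "B", "C" are hypothetical.

```python
import math


def fit_centroids(X, y):
    """Learn one mean feature vector per class from the training samples."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_proba(centroids, row):
    """Softmax over negative squared distances: a closer centroid scores higher."""
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(row, center))
        for label, center in centroids.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict(centroids, row):
    """Return the label with the highest predicted probability."""
    proba = predict_proba(centroids, row)
    return max(proba, key=proba.get)


# Hypothetical two-feature training set with three classes.
X_train = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y_train = ["A", "A", "B", "B", "C", "C"]
centroids = fit_centroids(X_train, y_train)

print(predict(centroids, [0.2, 0.4]))
print(predict_proba(centroids, [4.0, 4.0]))
```

The same interface (fit on training data, then return class probabilities for unseen samples) applies to more expressive models such as deep networks; only the form of the learned mapping changes.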
In recent years, deep learning has become a new trend in machine learning and has succeeded in many applications across different domains [1]. Models based on deep learning have also been widely applied in the health informatics field [2], including translational bioinformatics, medical imaging, pervasive sensing for health, and medical informatics [3]. In this work, we employed a self-organizing deep auto-encoder model to classify data based on the candidate features.
A self-organizing deep auto-encoder is a specific type of deep auto-encoder that can determine its structure automatically, including the number of neurons and layers [4]. Description of the self-organizing deep auto-encoder is available in more detail in […]. First, the training of the deep model and the model selection were performed using the training and validation data, respectively. Next, the classification performance was estimated using the test data.

Table 1: Pseudocode of frequent itemset generation step in FP-Growth algorithm [5]
Algorithm: Frequent itemset generation in FP-Growth algorithm
Input: A database DB, represented by FP-tree constructed, and a minimum support threshold ξ.
(1) if Tree contains a single prefix path, then: // Mining single prefix-path FP-tree
(2) let P be the single prefix-path part of Tree;
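The pseudocode of Table 1 is incomplete in this document. As an independent illustration of frequent itemset generation with an FP-tree, the following is a minimal Python sketch. It is not a reconstruction of the table: it omits the single prefix-path shortcut mentioned in step (1) and always recurses on conditional pattern bases, which yields the same itemsets. The names `Node`, `build_tree`, `fp_growth`, and `mine`, and the toy transactions, are assumptions made for this example.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, a parent link, and child nodes."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return its header table."""
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, count in weighted_transactions:
        # Keep frequent items only, ordered by descending support, then name.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Yield (itemset, support) for every frequent itemset."""
    header, frequent = build_tree(weighted_transactions, min_support)
    for item, item_support in frequent.items():
        itemset = suffix | {item}
        yield itemset, item_support
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_support, itemset)


def mine(transactions, min_support):
    """Convenience wrapper: each input transaction counts once."""
    return dict(fp_growth([(t, 1) for t in transactions], min_support))


# Hypothetical toy transactions (generic item names, not study data).
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent_itemsets = mine(transactions, min_support=3)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Association rules are then derived from these itemsets by comparing the support of an itemset with the support of its subsets; that step is not shown here.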

Supplementary Figures
Supplementary Figure 1

Supplementary Discussion
Possible pathological roles of candidate transcripts in KIRC
As a transcriptional cofactor, the PSMD9 can control translation, transcription, and receptor/hormone activity through its interaction with the S14 (a ribosomal protein), CSH1 (a growth hormone), E12 (a transcription factor), and the IL6 receptor [7]. It may participate in the activin A signaling cascade and growth regulation of cancer cells [8]. The interaction of the PSMD9 with the hnRNPA1 (heterogeneous nuclear ribonucleoprotein A1) results in the degradation of IκBα and the activation of NF-κB [9]. The PSMD9 also plays a role in preserving the integrity and morphology of the nucleolus and indirectly supports p53 degradation by decreasing the free cytoplasmic ribosomal proteins that would otherwise inhibit the MDM2 E3 ligase activity. These functions counteract anti-cancer drug-induced nucleolar stress and confer a survival benefit on cells expressing the PSMD9. The absence of the PSMD9, through nucleoplasmic reorganization of free ribosomal proteins, p53 stabilization, and inhibition of MDM2 E3 ligase activity, can indirectly affect cell survival and cell cycle regulation [10].
The fourth top mRNA identified in the KIRC was basophilic leukemia-expressed protein (BLES03, C11orf68). The structure of this protein is similar to that of the eIF4E, and its biological function is unknown. It might participate in a biochemical process that requires nucleic acid recognition [11]. Elevated expression of the BLES03 gene was seen in both primary tumors and cell lines of laryngeal squamous cell carcinoma due to copy number gain [12]. It is also reported that the BLES03 is activated by hypomethylation in metastatic prostate, liver, and breast cancer cell lines [13]. The BLES03-like proteins may be novel eukaryotic phosphothreonine lyases involved in the biosynthesis of dehydro amino acids [14]. The role of the BLES03 in the pathogenesis of the KIRC needs to be investigated.
An epigenetic histone modification, the monoubiquitination of histone H2B at lysine 120 (H2Bub1), is connected with the DNA damage response and active transcription. Human RING-finger protein 40 (RNF40) is an E3 ligase for H2B ubiquitination. During the DNA damage response, the interaction of p53 with the RNF40/RNF20/WAC complex regulates the transcription of genes [15]. The RNF40/20 complex also controls p53-dependent mRNA splicing and gene transcription [16]. Moreover, this complex regulates DNA repair and chromatin stability, and its aberrant expression causes replication stress and genomic instability, resulting in a dysregulated transcriptional program. By stimulating NF-κB activity, the RNF40 has an essential role in the preservation of inflammatory signaling and tumorigenic features [17]. These data indicate that the RNF40 may be involved in the initial step of carcinogenesis [18]. However, it is not clear whether the RNF40 is a friend or foe in cancer. It may act as an oncogene in liver and colorectal cancer [19], prostate cancer, and acute lymphoblastic leukemia [20], or as a tumor suppressor [21,22]. The role of the RNF40 in the pathogenesis of the KIRC needs to be determined. Since different tumor suppressors and oncogenes are controlled by Ub- and proteasome-mediated degradation, the CSN7A, UBAC1, PSMD9, and RNF40 may play important roles in the pathogenesis of the KIRC. The roles of the Capn4, TMEMs, and CFL1 in the KIRC have been reported previously [23][24][25].

Possible pathological roles of candidate transcripts in KIRP
Zinc finger proteins (ZFPs) are transcription factors that, through binding to RNA or DNA, play significant roles in the regulation of transcription, DNA repair, cell migration, degradation of Ub-protein, and signal transduction [26]. They are implicated in the progression of cancer by controlling the transcription of genes involved in apoptosis, migration, proliferation, and invasion [27]. They have crucial functions in the initiation and development of cancers, especially the RCC [28][29][30]. The ZNF41 is the second top mRNA identified in our deep-learning analysis. It is located on the X chromosome [31] and is associated with transcription regulation. The ZFP41 is identified as a cancer-related protein in malignant melanoma tissue [32]. In liver cancer, the ZFP41 gene is silenced via DNA hypermethylation [33]. The contribution of the ZNF41 to the pathogenesis of the KIRP may be associated with the modification of cancer signaling pathways, and its functions need to be studied.
Cilia and flagella-associated protein 36 (CFAP36 or CCDC104) was the third top identified mRNA in the KIRP. The CFAP36 is a binding partner of the Arl3, a small GTPase in the primary cilia [34].
The primary cilium, an antenna-like structure on cells, is involved in the regulation of the cell cycle and organ homeostasis. The CFAP36 may also participate in cellular cilia formation and mitosis of cancer cells [35][36][37]. In the renal epithelium, the formation, function, and preservation of primary cilia are affected by the pVHL [38]. The loss of cilia is a common event underlying tumorigenesis in several subtypes of the RCC, especially the KIRC, suggesting a role for non-ciliated cells in cancer development [39]. In the KIRP with functioning pVHL, high frequencies of cilia were seen, indicating that this subtype can develop without disassembling its primary cilia.
Evidence has revealed a reciprocal relationship between the NRF2 and primary cilia. By inducing autophagy, primary cilia downregulate the NRF2 activity. On the other hand, the NRF2 transcriptionally controls genes implicated in Hedgehog (Hh) signaling, ciliogenesis, tumorigenesis, and stem cell function; however, the NRF2 may also impact these processes negatively [40,41].
The fibroblast growth factor (FGF) and its receptor (FGFR) are critical factors in the transformation and tumorigenicity of the RCC [42,43]. The FGFR pathway is implicated in driving tumor angiogenesis independently of the VEGF (vascular endothelial growth factor) to escape VEGF-targeted therapies [44].

Possible pathological roles of candidate microRNAs in the KIRC and KIRP
The miR-28 presents contradictory roles in the RCC. In the context of the KIRC, a direct association was found between an elevated level of the miR-28-5p and the aneuploidy state that is induced by VHL depletion or pVHL loss of function. In VHL-associated cancers, upregulated miR-28-5p stimulates chromosomal instability by preventing the translation of the Mad2 [45]. On the other hand, a decreased level of the miR-28-5p was observed in different renal carcinoma cell lines and RCC tumor samples [46,47]. As a tumor suppressor, the miR-28-5p inhibits the proliferation and migration of RCC cells by downregulating the RAP1B, a GTPase, and affecting the activation of the Erk1/2 and p38 MAP kinases [46].
Upregulated levels of the let-7i-5p were connected with high progression or grade/stage of the KIRC [48] and poor prognosis [49]. The let-7i-5p can stimulate the KIRC cells' proliferation, invasion, and migration, by downregulating its target hyaluronan-binding protein 4 (HABP4 or Ki-1/57), a tumor suppressor gene [50]. Moreover, a dysregulated level of the let-7i was observed in plasma exosomes of patients with metastatic renal cancer [51].
By generating ROS (reactive oxygen species) and reducing HIF signaling, mitochondrial proline oxidase induces apoptosis and prevents cell proliferation [53]. In contrast to this study, a decreased expression level of the miR-23b was seen in the KIRC tissue that was associated with pathological stage/grade and a high risk of cancer progression [54]. It is indicated that the cytokine-cytokine receptor interaction pathway was the major pathway controlled by the miR-23b in the RCC.
Colony-stimulating factor 1, epidermal growth factor receptor, MET proto-oncogene, IL-21 receptor, and chemokine (C-X-C motif) ligand 11 were the main targets of the miR-23b in this pathway [55]. Additionally, an association was reported between prolonged response to sunitinib and elevated levels of the miR-23b in patients with metastatic RCC [56].
The miR-125a is a tumor suppressor that regulates hyaluronic acid synthase 1 and cellular proliferation, apoptosis, and migration by targeting the STAT3 in the RCC [57,58]. The miR-125a-3p expression is activated by the PinX1, a tumor suppressor. The VEGF is a direct target of the miR-125a-3p. The miR-125a-3p/VEGF signaling pathway is involved in renal cancer angiogenesis [59].
The miR-22 plays divergent roles in tumor progression. It is down-regulated in cell lines, serum, and tissues of patients with the RCC [60][61][62]. On the other hand, as a master oncomir, the miR-22 controls the expression of genes related to survival in the KIRC by stimulating cellular invasion [63]. Moreover, the miR-22 impacts the proliferation and migration of KIRC cells by targeting SIRT1, PTEN, CREB1, and PI3K/AKT pathways [62,64,65,66].
Many studies have examined the impact of the miR-210 in the RCC, indicating that the miR-210 functions as an oncomir in the RCC [67,68], especially in the KIRP [69]. The results of two meta-analyses indicated that the miR-210 may be a diagnostic biomarker in the RCC [54,70]. As an onco-suppressor, the miR-99b represses mTOR and IGF1R expression to down-regulate mTOR/AKT/IGF1R signaling [71]. A low expression level of the miR-99b-5p might correlate with tumor progression in KIRC patients treated with tyrosine kinase inhibitors [72].
The miR-101 is induced by hypoxia and promotes glycolysis by targeting TIGAR (TP53-induced glycolysis and apoptosis regulator) in the KIRC [73]. As a tumor suppressor, the miR-101-3p inhibits cell proliferation, invasion, and migration of the RCC by targeting the EZH2S, DONSON [74], and oncogenic factors involved in the pathogenesis of the KIRC [75,76]. Moreover, by targeting the UHRF1, the miR-101 suppresses nucleotide excision repair and mismatch repair [77].
Collectively, because of frequent mutations in protein-coding regions and an elevated burden of unfolded proteins, rapidly dividing cancer cells require elevated protein turnover, and the observed up-regulation of proteasome assembly proteins is an adaptive mechanism to meet this need. Hence, inhibition of the UPS components appears to be a promising strategy for KIRC therapy.
Our study may open a new horizon for investigating the role of the CSN7A, UBAC1, PSMD9, and RNF40 in the pathogenesis of the KIRC. In the KIRC association rules, the COPS7A (CSN7A) has a high dependency on PSMD9, CAPNS1, UBAC1, and SNU13, in that order.
Although much remains to be elucidated about the KIRP mechanism, the roles of the NMD3, ZNF41, CFAP36, FGFR1OP2, and RGL1 are of considerable interest. In the KIRP association rules, the NMD3 has a high dependency on CAPNS1, INTS5 (integrator complex subunit 5), and CFAP36, in that order.